File System Workload Analysis For Large Scale Scientific Computing Applications

نویسندگان

  • Feng Wang
  • Qin Xin
  • Bo Hong
  • Scott A. Brandt
  • Ethan L. Miller
  • Darrell D. E. Long
  • Tyce T. McLarty
چکیده

Parallel scientific applications require high-performance I/O support from underlying file systems. A comprehensive understanding of the expected workload is therefore essential for the design of high-performance parallel file systems. We re-examine the workload characteristics in parallel computing environments in the light of recent technology advances and new applications. We analyze application traces from a cluster with hundreds of nodes. On average, each application has only one or two typical request sizes. Large requests from several hundred kilobytes to several megabytes are very common. Although in some applications, small requests account for more than 90% of all requests, almost all of the I/O data are transferred by large requests. All of these applications show bursty access patterns. More than 65% of write requests have inter-arrival times within one millisecond in most applications. By running the same benchmark on different file models, we also find that the write throughput of using an individual output file for each node exceeds that of using a shared file for all nodes by a factor of 5. This indicates that current file systems are not well optimized for file sharing.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

File System Workload Analysis For Large Scientific Computing Applications

Parallel scientific applications require high-performance I/O support from underlying file systems. A comprehensive understanding of the expected workload is therefore essential for the design of high-performance parallel file systems. We re-examine the workload characteristics in parallel computing environments in the light of recent technology advances and new applications. We analyze applica...

متن کامل

E2DR: Energy Efficient Data Replication in Data Grid

Abstract— Data grids are an important branch of gird computing which provide mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client's applications in scientific and business domai...

متن کامل

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

متن کامل

A Case Study: Performance Analysis and Optimization of SAS® Grid Computing Scaling on a Shared Storage

SAS® Grid Computing is a scale-out SAS® solution that enables SAS applications to better utilize I/O and compute intensive computing resources. This requires the use of high-performance shared storage (SS) that allows all servers to access the same file system. SS may be implemented via traditional NFS NAS or clustered file systems (CFS) like GPFS. This paper uses the Intel® Enterprise Edition ...

متن کامل

A Metadata Workload Generator for Data-Intensive File Systems

Large-scale data-intensive computing [2, 3] has posed numerous challenges to the underlying distributed file system, due to the unprecedented amount of data, the large number of users, the intense competition on cost and service quality, and the emergence of new applications. As a result, there has been an increasing amount of research on scalable metadata management [4, 6], high availability [...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004